Conversation
This comment was marked as outdated.
This comment was marked as outdated.
3111318 to
675718d
Compare
This comment was marked as outdated.
This comment was marked as outdated.
675718d to
d4e849f
Compare
|
Most changes to Python require a NEWS entry. Please add it using the blurb_it web app or the blurb command-line tool. |
ef5eb69 to
b82fa85
Compare
6998be1 to
be39ff2
Compare
|
@AlexWaygood You’ve been the only human to interact with this PR so far, do you possibly have any advice on how to move this forward? |
|
Hi @marcelm — sorry for the delay in anybody looking at this. I haven't studied your PR in detail (or thought about whether the proposal is a good idea), but it looks well put together at first glance. I'll try to take a look soon. I won't be able to review the C code, but I can comment on whether the proposal seems like a good idea, and I can review the Python implementation and the tests. Note that if this proposal is accepted, it will also need:
You can also add yourself to |
|
Thank you! I pushed a documentation update and will add an entry to the What’s new document in case the PR is reviewed favorably. |
|
I'll defer to @benjaminp's judgement on this one -- I'm really not the right reviewer for this, unfortunately :( Please ping me again if you still haven't had a review in a few weeks. |
|
Thank you! I appreciate that you took the time. |
@AlexWaygood Hi! Been 8 months, no review so far :) How can we take this forward? I'm not the original author of the PR but I also have a use-case for this API and would like to see this change land, happy to help progress this. |
|
Thanks for your interest! I have updated the PR to fix the merge conflict and to reflect that it now needs to target 3.13. |
|
@marcelm That was quick, thank you so much! |
5cb3143 to
79d5032
Compare
|
cc: @erlend-aasland @vstinner @benjaminp for the A |
| .. versionadded:: 3.13 | ||
|
|
||
| Return bytes from the current position onwards but without advancing the | ||
| position. The number of bytes returned may be less or more than requested. |
There was a problem hiding this comment.
Wait. Why can it more than requested?
Apparently size=0 means "read all". Please document it.
There was a problem hiding this comment.
This was just copied from the documentation for BufferedReader.peek(), but I agree that it could be written better. Changed now.
Wait. Why can it more than requested?
BufferedReader.peek() ignores the size argument and just returns whatever it has in its internal buffer.
There was a problem hiding this comment.
I would prefer that BytesIO.peek() documents its behavior, rather than BufferedReader.peek() behavior.
There was a problem hiding this comment.
Absolutely, it’s already changed.
(BTW, is there a policy on who clicks the "Resolve conversation" button?)
| pos = self.tell() | ||
| if size == 0: | ||
| size = -1 | ||
| b = self.read(size) |
There was a problem hiding this comment.
Would it be possible to implement it without touching the position? This code is not thread safe. I don't know if it's supposed to be thread-safe. Maybe add a private read method which has an argument to decide to move the position or not.
Same remark for C code.
There was a problem hiding this comment.
Good point; I implemented this now so that the position is not changed. I factored out a peek_bytes function in the C version that does not advance the position.
It seems though that neither the C nor the Python version of BytesIO are supposed to be thread safe. (They don’t use locks as e.g. BufferedReader does.) So I would suggest making them so would be a task for a different PR.
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
611bdaa to
77e04d6
Compare
vstinner
left a comment
There was a problem hiding this comment.
I asked BytesIO.peek(0) to return an empty string, but I didn't notice that BufferedReader.peek(0) returns non-empty string. Now I'm unhappy that the two classes have an inconsistent behavior on peek(0), sorry.
>>> f=open("/etc/issue", "rb")
>>> f.peek(0)
b'\\S\nKernel \\r on \\m (\\l)\n\n'
On a short file, BufferedReader.peek(0) even returned the whole file contents!
Since BufferedReader.peek(0) behavior already exists, I would suggest to use the same behavior on BytesIO.peek(): return non-empty string.
cc @cmaloney
Co-authored-by: Victor Stinner <vstinner@python.org>
Sure! I have changed it so that the rest of the buffer is returned for The signature is still such that the default is
$ python3 -c 'f=open("README.rst", "rb");f.read(8191);print(f.peek())'
b'r'
Alternatively, the default could become |
|
I have looked into the test failures on free-threaded Python but don’t know what is going on. The |
You must decorate diff --git a/Modules/_io/bytesio.c b/Modules/_io/bytesio.c
index f43cf1f7e41..98f31f3d1a8 100644
--- a/Modules/_io/bytesio.c
+++ b/Modules/_io/bytesio.c
@@ -511,6 +511,7 @@ _io_BytesIO_read1_impl(bytesio *self, Py_ssize_t size)
/*[clinic input]
+@critical_section
_io.BytesIO.peek
size: Py_ssize_t = 1
/ |
Thanks, that worked! |
|
Is anything else needed? As far as I understand, this PR would need to be merged before beta 1 in two weeks in order to get this into 3.15. |
|
|
||
| Return :class:`bytes` containing the entire contents of the buffer. | ||
|
|
||
| .. method:: peek(size=1, /) |
There was a problem hiding this comment.
Why not default to size=0 like BufferedReader?
There was a problem hiding this comment.
Read through the comments. In general anything returning a bytes rather than a bytes-like memoryview is going to require allocating and copying potentially many bytes... If the copy is really concerning I'd lean returning a memoryview rather than a bytes which is a mandatory copy.
There was a problem hiding this comment.
memoryview came up in this comment and the three following it: #30808 (comment)
I don’t know what the conclusion is given your comment. Should a memoryview be returned instead? Most important to me is compatibility with what BufferedReader.peek() returns.
I am not too concerned about the extra memory for a bytes object as long as the default is size=1.
There was a problem hiding this comment.
👍 to not using memoryview / definitely a good rationale in that thread.
On 3.14.0 BufferedReader (intentionally limiting buffer size) gives:
Python 3.14.0 (v3.14.0:ebf955df7a8, Oct 7 2025, 08:20:14) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> br = open('README.rst', 'rb', buffering=30)
>>> def show_start_end(buf):
... print(f'{len(buf)=}|{buf=}')
...
>>> show_start_end(br.peek())
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> show_start_end(br.peek(0))
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> show_start_end(br.peek(1))
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> _ = br.read(len(br.peek()))
>>> show_start_end(br.peek())
len(buf)=30|buf=b'alpha 7\n======================'Which doesn't match the behavior here or defaults. That in particular concerns me because it seems like people will build and test I/O stack pieces (ex. file parsing) expecting the .peek behavior of BytesIO then get something different when they read actual files/data...
Looking for more alternatives, the documentation page says (https://docs.python.org/3/library/io.html#io.BufferedReader.peek):
Return bytes from the stream without advancing the position. The number of bytes returned may be less or more than requested. If the underlying raw stream is non-blocking and the operation would block, returns empty bytes.
To me that leaves an intentional gap where we can always return less data than the total amount available. Peek gives no guarantees. That makes me wonder: Could we default to 0 and just always slice [:DEFAULT_BUFFER_SIZE]?
That would mean you always get DEFAULT_BUFFER_SIZE which matches default BufferedReader behavior unless there is less data available. If called in a loop yes that's a DEFAULT_BUFFER_SIZE repeated copy (BufferedIO also does that / the bytes requires a copy), but it's a lot less than "the whole buffer".
|
Also: Please extend the free threading test in |
|
Overall looking good, definitely think this is in good shape. Just some weird corners (I've spent too much time reading I/O code...) |
marcelm
left a comment
There was a problem hiding this comment.
Thanks for the comments! I tried to address them.
Also: Please extend the free threading test in Lib/test/test_free_threading/test_io.py to cover the new method
I have added something there, but wasn’t quite sure what I was doing; can you please have a look?
|
|
||
| Return :class:`bytes` containing the entire contents of the buffer. | ||
|
|
||
| .. method:: peek(size=1, /) |
There was a problem hiding this comment.
memoryview came up in this comment and the three following it: #30808 (comment)
I don’t know what the conclusion is given your comment. Should a memoryview be returned instead? Most important to me is compatibility with what BufferedReader.peek() returns.
I am not too concerned about the extra memory for a bytes object as long as the default is size=1.
|
|
||
| Return :class:`bytes` containing the entire contents of the buffer. | ||
|
|
||
| .. method:: peek(size=1, /) |
There was a problem hiding this comment.
👍 to not using memoryview / definitely a good rationale in that thread.
On 3.14.0 BufferedReader (intentionally limiting buffer size) gives:
Python 3.14.0 (v3.14.0:ebf955df7a8, Oct 7 2025, 08:20:14) [Clang 16.0.0 (clang-1600.0.26.6)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> br = open('README.rst', 'rb', buffering=30)
>>> def show_start_end(buf):
... print(f'{len(buf)=}|{buf=}')
...
>>> show_start_end(br.peek())
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> show_start_end(br.peek(0))
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> show_start_end(br.peek(1))
len(buf)=30|buf=b'This is Python version 3.15.0 '
>>> _ = br.read(len(br.peek()))
>>> show_start_end(br.peek())
len(buf)=30|buf=b'alpha 7\n======================'Which doesn't match the behavior here or defaults. That in particular concerns me because it seems like people will build and test I/O stack pieces (ex. file parsing) expecting the .peek behavior of BytesIO then get something different when they read actual files/data...
Looking for more alternatives, the documentation page says (https://docs.python.org/3/library/io.html#io.BufferedReader.peek):
Return bytes from the stream without advancing the position. The number of bytes returned may be less or more than requested. If the underlying raw stream is non-blocking and the operation would block, returns empty bytes.
To me that leaves an intentional gap where we can always return less data than the total amount available. Peek gives no guarantees. That makes me wonder: Could we default to 0 and just always slice [:DEFAULT_BUFFER_SIZE]?
That would mean you always get DEFAULT_BUFFER_SIZE which matches default BufferedReader behavior unless there is less data available. If called in a loop yes that's a DEFAULT_BUFFER_SIZE repeated copy (BufferedIO also does that / the bytes requires a copy), but it's a lot less than "the whole buffer".
| barrier.wait() | ||
| b.readinto(into) | ||
|
|
||
| def peek(barrier, b, into, *ignore): |
There was a problem hiding this comment.
Need a bit more than this. These helpers define "how do we make the many threads race as much as possible" / help synchornize their start.
This needs to also get added to the self.check section below which sets "this is the set of operations we're going to have race".
So self.check([truncate] + [read] * 10, self.ioclass(b'0\n'*204800)) for example tests what happens if a bunch of truncate and read calls race with eachtoher.
Closes gh-90533